fix(mcp): resolve SSE connection deadlock with asyncio.Lock #909

Closed

steveant wants to merge 1 commit into coleam00:main from AeyeOps:fix/mcp-asyncio-lock-deadlock

Conversation


@steveant steveant commented Dec 20, 2025

Summary

Replace threading.Lock with asyncio.Lock in the MCP server lifespan manager to prevent event loop blocking when multiple SSE connections are made concurrently.

Problem

The original code used threading.Lock, which blocks the entire event loop thread while waiting to acquire. Because the context was yielded while the lock was still held, other connections could not proceed, and the server became unresponsive after two or more concurrent SSE sessions.
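The failure mode can be reproduced in isolation. In this sketch (a simplified stand-in for the lifespan code, not the actual module), one coroutine holds a threading.Lock across an await point; any other coroutine that then tries a blocking acquire freezes the whole event loop, so the holder can never resume and release. A timeout is added so the demo terminates instead of deadlocking:

```python
import asyncio
import threading

lock = threading.Lock()

async def holder() -> None:
    lock.acquire()
    try:
        # Simulates yielding/awaiting while holding the lock,
        # as the old lifespan code effectively did.
        await asyncio.sleep(0.2)
    finally:
        lock.release()

async def contender() -> bool:
    # This blocking acquire runs on the event loop thread, freezing the
    # loop; holder's sleep can never complete, so the lock is never
    # released and the acquire times out.
    return lock.acquire(timeout=0.5)

async def main() -> bool:
    h = asyncio.create_task(holder())
    await asyncio.sleep(0.05)  # let holder take the lock first
    got = await contender()
    await h
    return got

blocked = asyncio.run(main())
print("contender acquired lock:", blocked)
```

With a real SSE endpoint there is no timeout, so the second connection simply hangs, which matches the reported symptom.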

Solution

  • Use asyncio.Lock (yields to other coroutines instead of blocking)
  • Yield context outside the lock block
  • Capture context in local variable to prevent race conditions
  • Add explicit None guard before yield
  • Add documentation comments explaining design decisions

Testing

Tested with 20 concurrent connections:

  • All requests completed successfully (< 0.5s total)
  • Single initialization confirmed via logs
  • No deadlock or timeout observed

Notes

  • asyncio.Lock is event-loop bound and therefore safe for a single Uvicorn worker (FastMCP's intended deployment)
  • Synchronous init calls inside the lock don't cause a deadlock, but they may delay concurrent connections under heavy load

Summary by CodeRabbit

  • Refactor
    • Enhanced internal initialization handling to improve startup reliability under concurrent connection scenarios.


Replace threading.Lock with asyncio.Lock in lifespan manager to prevent
event loop blocking when multiple SSE connections are made concurrently.

Problem:
- threading.Lock blocks the entire event loop thread
- Yielding while holding the lock prevented other connections
- Server became unresponsive after 2+ concurrent SSE sessions

Solution:
- Use asyncio.Lock (yields to other coroutines instead of blocking)
- Yield context outside the lock block
- Capture context in local variable to prevent race conditions
- Add explicit None guard before yield

Tested with 20 concurrent connections - all passed, single init confirmed.

coderabbitai Bot commented Dec 20, 2025

Walkthrough

The initialization synchronization mechanism was refactored to replace a threading-based lock with an asyncio-compatible lock. The lifespan initialization flow now uses an async context manager with a double-check pattern, captures the shared context in a local variable to avoid race conditions, and yields it outside the lock so active connections do not block others.

Changes

  • Async initialization lock refactor (python/src/mcp_server/mcp_server.py): Added asyncio import; replaced threading.Lock() with asyncio.Lock() for the module-level _initialization_lock; updated lifespan initialization to use an async context manager with a double-check pattern; introduced a local ctx variable to capture the ArchonContext and prevent race conditions; moved the context yield outside the lock, with a guard that raises on initialization failure; maintained shared context reuse and added cleanup logging.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12–18 minutes

  • Verify that asyncio.Lock() is the correct synchronization primitive for the async context
  • Confirm the double-check pattern correctly prevents race conditions without introducing new ones
  • Ensure the local ctx variable properly captures the ArchonContext and that the guard condition raises appropriately on initialization failure
  • Check that moving the yield outside the lock does not reintroduce timing vulnerabilities

Poem

🐰 A threading lock, so old and bold,
Gave way to async, swift and true—
With double-checks and context caught,
No race conditions slip right through!
The shared realm waits outside the gate,
Where locks release and tasks align. ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Title check: Passed. The title accurately and concisely summarizes the main change: replacing threading.Lock with asyncio.Lock to fix an SSE connection deadlock.
  • Description check: Passed. The description covers the main sections (Summary, Problem, Solution, Testing, Notes) but is missing several template sections, such as Type of Change, Affected Services, Testing checklist items, and formal checklist verification.
  • Docstring coverage: Passed. Docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
python/src/mcp_server/mcp_server.py (1)

163-176: Synchronous calls inside async lock may delay concurrent connections.

The comments correctly note that synchronous calls to get_session_manager() and get_mcp_service_client() inside the lock don't cause deadlock (asyncio.Lock yields control), but they may delay concurrent connection attempts during initialization.

For current usage this is acceptable, but if these become I/O-bound bottlenecks under high concurrent load, consider wrapping them with asyncio.to_thread() or loop.run_in_executor().

🔎 Optional optimization if initialization becomes a bottleneck

If profiling shows these sync calls are slow:

                # Initialize session manager
-                # NOTE: These sync calls run inside the lock. They don't cause deadlock
-                # (asyncio.Lock yields to other coroutines), but slow init here delays
-                # concurrent connection attempts. Consider run_in_executor if these
-                # become I/O-bound bottlenecks under load.
                logger.info("🔐 Initializing session manager...")
-                session_manager = get_session_manager()
+                session_manager = await asyncio.to_thread(get_session_manager)
                logger.info("✓ Session manager initialized")

                # Initialize service client for HTTP calls
                logger.info("🌐 Initializing service client...")
-                service_client = get_mcp_service_client()
+                service_client = await asyncio.to_thread(get_mcp_service_client)
                logger.info("✓ Service client initialized")

Note: Only apply if these functions are I/O-bound and measurements show they're bottlenecks.
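For reference, asyncio.to_thread offloads a blocking call to the default thread-pool executor, so the event loop keeps serving other coroutines while it runs. A small self-contained illustration (slow_sync_init is a hypothetical stand-in for a blocking initializer such as get_session_manager):

```python
import asyncio
import time

def slow_sync_init() -> str:
    # Stand-in for a blocking initializer (e.g. sync I/O or setup work)
    time.sleep(0.2)
    return "ready"

async def main() -> tuple[str, int]:
    ticks = 0

    async def heartbeat() -> None:
        # Keeps running only if the event loop is not blocked
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    # The blocking call runs in a worker thread; the loop stays responsive
    result = await asyncio.to_thread(slow_sync_init)
    hb.cancel()
    return result, ticks

result, ticks = asyncio.run(main())
print(result, "heartbeats during init:", ticks)
```

The heartbeat counter advancing during the 0.2 s init shows the loop was never blocked; calling slow_sync_init() directly in the coroutine would leave the counter at zero.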

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecaece4 and 26beb62.

📒 Files selected for processing (1)
  • python/src/mcp_server/mcp_server.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
python/src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

python/src/**/*.py: Complete batch processing, background tasks, and external API calls but log detailed errors for each failure instead of crashing
Include context about what was being attempted, preserve full stack traces with exc_info=True in Python logging, use specific exception types, include relevant IDs/URLs, never return None to indicate failure, and report both success count and detailed failure list for batch operations
Use Ruff for linting and MyPy for type checking with 120 character line length in Python backend
Use Python 3.12 with 120 character line length in backend code
Import GITHUB_REPO_OWNER and GITHUB_REPO_NAME constants from python/src/server/config/version.py as the single source of truth for repository configuration

python/src/**/*.py: Python backend code should use 120 character line length
Use Ruff for linting Python code to check for errors, warnings, and unused imports
Use Mypy for type checking in Python to ensure type safety

Files:

  • python/src/mcp_server/mcp_server.py
python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Error messages must include context about what was being attempted, preserve full stack traces, use specific exception types, include relevant IDs/URLs, and report both success count and detailed failure list for batch operations

Files:

  • python/src/mcp_server/mcp_server.py
🧠 Learnings (1)
📚 Learning: 2025-11-29T19:24:46.265Z
Learnt from: CR
Repo: coleam00/Archon PR: 0
File: archon-example-workflow/CLAUDE.md:0-0
Timestamp: 2025-11-29T19:24:46.265Z
Learning: Check if Archon MCP server is available before doing anything else in task management scenarios

Applied to files:

  • python/src/mcp_server/mcp_server.py
🧬 Code graph analysis (1)
python/src/mcp_server/mcp_server.py (1)
python/src/server/utils/progress/progress_tracker.py (1)
  • error (196-219)
🔇 Additional comments (6)
python/src/mcp_server/mcp_server.py (6)

17-17: LGTM: Asyncio import added correctly.

The asyncio import is necessary for asyncio.Lock() usage and is properly placed with other standard library imports.


144-148: LGTM: Optimistic check pattern is efficient.

The quick check without acquiring the lock (lines 144-148) optimizes the common case where initialization is already complete, allowing subsequent SSE connections to proceed immediately.

The explicit is not None check is good defensive programming.


150-158: LGTM: Double-check pattern correctly implemented.

The double-check locking pattern is properly implemented:

  1. Local variable ctx avoids race conditions between lock release and yield
  2. Type hint ArchonContext | None is explicit
  3. Async lock context manager is correct
  4. Double-check inside lock prevents duplicate initialization

185-192: LGTM: Atomic assignment and comprehensive error handling.

The context is assigned to _shared_context last (line 186) for atomicity, and the flag is set after successful initialization (line 187). Error handling properly logs the exception with full traceback per coding guidelines.


194-203: LGTM: Yielding outside lock prevents blocking—key fix for deadlock.

This is the critical fix: yielding the context outside the lock (line 200) ensures other concurrent connections aren't blocked while this connection is active.

The None guard (lines 197-198) is good defensive programming, though unlikely to trigger in normal operation since initialization either succeeds or raises.


67-70: Excellent fix addressing event-loop blocking with clear documentation.

The change from threading.Lock to asyncio.Lock correctly resolves the event-loop blocking issue that caused SSE connection deadlocks. The inline documentation clearly explains the event-loop binding constraint and deployment assumptions.

Single-worker deployment is enforced by default—neither the Dockerfile.mcp nor docker-compose.yml configuration specifies worker processes, so Uvicorn runs with a single worker. The code comment provides clear guidance for any future multi-worker scenarios.

steveant added a commit to AeyeOps/archon that referenced this pull request Dec 20, 2025
- Add supabase-network to archon-agent-work-orders service
  (required for local Supabase CLI connectivity)
- Fix MCP SSE deadlock by using asyncio.Lock instead of threading.Lock
  (PR coleam00#909 submitted upstream)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@steveant
Author

Superseded by #928 which consolidates this fix along with other improvements.

@steveant steveant closed this Jan 15, 2026